Splitting Randomized Stationary Policies in Total-Reward Markov Decision Processes

Authors

  • Eugene A. Feinberg
  • Uriel G. Rothblum
Abstract

This paper studies a discrete-time total-reward Markov decision process (MDP) with a given initial state distribution. A (randomized) stationary policy can be split on a given set of states if the occupancy measure of this policy can be expressed as a convex combination of the occupancy measures of stationary policies, each selecting deterministic actions on the given set and coinciding with the original policy outside of it. For a stationary policy, necessary and sufficient conditions are provided for splitting it at a single state, as well as sufficient conditions for splitting it on the whole state space. These results are applied to constrained MDPs. The results are refined for absorbing (including discounted) MDPs with finite state and action spaces. In particular, the paper provides an efficient algorithm that expresses the occupancy measure of a given policy as a convex combination of the occupancy measures of finitely many (stationary) deterministic policies; the algorithm generates the splitting policies so that each pair of consecutive policies differs at exactly one state. These results are applied to constrained problems to compute an optimal policy efficiently by first computing and then splitting a stationary optimal policy.
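
The single-state splitting identity at the center of this abstract can be checked numerically. The sketch below (Python/NumPy, toy data; an illustration of the identity, not the paper's algorithm) computes occupancy measures for a small discounted MDP, splits a policy that randomizes at one state into the two policies that act deterministically there, and verifies the convex combination. All names and numbers (GAMMA, P, mu, the 3-state model) are made-up assumptions.

    import numpy as np

    # Toy discounted MDP (all data here is made up for illustration).
    GAMMA = 0.9                                   # discount factor
    N_S, N_A = 3, 2                               # states, actions
    rng = np.random.default_rng(0)
    P = rng.random((N_A, N_S, N_S))               # P[a, s, s'] transition kernel
    P /= P.sum(axis=2, keepdims=True)
    mu = np.array([0.5, 0.3, 0.2])                # initial state distribution

    def occupancy(pi):
        """Occupancy measure x(s,a) = sum_t gamma^t P(s_t=s, a_t=a) under pi[s,a].

        Solves the flow equation x_s = mu + gamma * P_pi^T x_s, where P_pi is
        the state-transition matrix induced by the stationary policy pi.
        """
        P_pi = np.einsum('sa,ast->st', pi, P)
        x_s = np.linalg.solve(np.eye(N_S) - GAMMA * P_pi.T, mu)
        return pi * x_s[:, None]                  # x(s, a) = pi(a|s) * x(s)

    # A policy randomizing only at state 0; deterministic (action 0) elsewhere.
    pi = np.zeros((N_S, N_A))
    pi[:, 0] = 1.0
    pi[0] = [0.3, 0.7]

    # Splitting policies: deterministic at state 0, equal to pi elsewhere.
    phis = []
    for a in range(N_A):
        phi = pi.copy()
        phi[0] = np.eye(N_A)[a]
        phis.append(phi)

    x_pi = occupancy(pi)
    x_phi = [occupancy(phi) for phi in phis]

    # Splitting weights: w_a proportional to pi(a|0) / x_phi_a(state 0),
    # which requires state 0 to be reachable under every phi_a.
    mass_at_0 = np.array([x.sum(axis=1)[0] for x in x_phi])
    w = pi[0] / mass_at_0
    w /= w.sum()

    mix = sum(wa * xa for wa, xa in zip(w, x_phi))
    assert np.allclose(mix, x_pi), "splitting identity failed"
    print("splitting weights:", w)

The weights are forced by the flow equations: normalizing w_a proportional to pi(a|s)/x_phi_a(s) makes the mixture reproduce the original randomization at the split state, and the requirement that each x_phi_a(s) be positive mirrors the single-state splitting condition the paper analyzes.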

Similar articles

Constrained Discounted Dynamic Programming

This paper deals with constrained optimization of Markov Decision Processes with a countable state space, compact action sets, continuous transition probabilities, and upper semi-continuous reward functions. The objective is to maximize the expected total discounted reward for one reward function, under several inequality constraints on similar criteria with other reward functions. Suppose a fe...
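
In symbols, the constrained problem described here has the standard form below (the notation β, μ, r_k, C_k is assumed for illustration, not taken from the truncated abstract):

    \max_{\pi}\; \mathbb{E}^{\pi}_{\mu} \sum_{t=0}^{\infty} \beta^{t} r_{0}(x_{t},a_{t})
    \quad\text{subject to}\quad
    \mathbb{E}^{\pi}_{\mu} \sum_{t=0}^{\infty} \beta^{t} r_{k}(x_{t},a_{t}) \;\ge\; C_{k},
    \qquad k = 1,\dots,K,

where β ∈ (0, 1) is the discount factor and μ the initial distribution; the compactness and semi-continuity assumptions quoted above are the standard route to the existence of optimal policies for such problems.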

Optimal Control of Average Reward Constrained Continuous-Time Finite Markov Decision Processes

The paper studies optimization of average-reward continuous-time finite state and action Markov decision processes with multiple criteria and constraints. Under the standard unichain assumption, we prove the existence of optimal K-switching strategies for feasible problems with K constraints. For switching randomized strategies, the decisions depend on the current state and the time spent i...
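
In a standard formulation (symbols assumed here, not taken from the truncated text), the constrained average-reward criterion referred to reads:

    \max_{\pi}\; \liminf_{T\to\infty} \frac{1}{T}\, \mathbb{E}^{\pi} \int_{0}^{T} r_{0}(x_{t},a_{t})\,dt
    \quad\text{subject to}\quad
    \liminf_{T\to\infty} \frac{1}{T}\, \mathbb{E}^{\pi} \int_{0}^{T} r_{k}(x_{t},a_{t})\,dt \;\ge\; C_{k},
    \qquad k = 1,\dots,K.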

Finite-Horizon Markov Decision Processes with State Constraints

Markov Decision Processes (MDPs) have been used to formulate many decision-making problems in science and engineering. The objective is to synthesize the best decision (action selection) policies to maximize expected rewards (minimize costs) in a given stochastic dynamical environment. In many practical scenarios (multi-agent systems, telecommunication, queuing, etc.), the decision-making probl...

Value Iteration and Action ε-Approximation of Optimal Policies in Discounted Markov Decision Processes

It is well known that in Markov decision processes with, for instance, a total discounted reward, it is not always possible to find the optimal stationary policy f∗ explicitly. But value iteration yields, at the N-th iteration of the procedure, a stationary policy fN whose discounted reward is close to that of f∗. A question then arises: are the actions f∗(x) and fN(x) necessa...
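
A minimal value-iteration sketch (Python/NumPy, made-up toy data, not from this paper) produces the objects in question: the greedy policy fN at each iteration, and the iteration from which its actions agree with the limiting optimal policy f∗:

    import numpy as np

    # Toy discounted MDP (made-up data, only to illustrate the question above).
    GAMMA = 0.9
    rng = np.random.default_rng(1)
    N_S, N_A = 4, 3
    P = rng.random((N_A, N_S, N_S))
    P /= P.sum(axis=2, keepdims=True)             # P[a, s, s'] transition kernel
    r = rng.random((N_S, N_A))                    # rewards r(s, a)

    v = np.zeros(N_S)
    policies = []                                 # greedy policy f_N per iteration
    for n in range(500):
        q = r + GAMMA * np.einsum('ast,t->sa', P, v)
        policies.append(q.argmax(axis=1))         # f_N: greedy actions w.r.t. v
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < 1e-12:
            break
        v = v_new

    f_star = policies[-1]                         # greedy at convergence ~ f*
    first_agree = next(i for i, f in enumerate(policies)
                       if all(np.array_equal(g, f_star) for g in policies[i:]))
    print("f* =", f_star, "; f_N agrees with f* from iteration", first_agree)

Typically the greedy actions stabilize long before the value estimates converge, which is exactly the gap between value accuracy and action accuracy that the title refers to.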

Continuous Time Discounted Jump Markov Decision Processes: A Discrete-Event Approach

This paper introduces and develops a new approach to the theory of continuous time jump Markov decision processes (CTJMDP). This approach reduces discounted CTJMDPs to discounted semi-Markov decision processes (SMDPs) and eventually to discrete-time Markov decision processes (MDPs). The reduction is based on the equivalence of strategies that change actions between jumps and the randomized stra...
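
For flavor only: the classical uniformization result (a textbook special case, not this paper's construction, which goes through SMDPs) reduces a discounted CTJMDP with bounded jump rates q(y|x,a), a uniformization constant Λ ≥ sup_{x,a} Σ_{y≠x} q(y|x,a), and discount rate α > 0 to a discrete-time MDP with

    p(y \mid x,a) = \frac{q(y \mid x,a)}{\Lambda}\ (y \neq x), \qquad
    p(x \mid x,a) = 1 - \sum_{y \neq x} \frac{q(y \mid x,a)}{\Lambda}, \qquad
    \beta = \frac{\Lambda}{\alpha + \Lambda},

with reward rates rescaled by 1/(α + Λ). Strategies that change actions between jumps fall outside this special case, which is what motivates the paper's more general reduction.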

Journal:
  • Math. Oper. Res.

Volume 37

Pages –

Publication year: 2012